Learning Disentangled Representation for Robust Person Re-identification
We address the problem of person re-identification (reID), that is, retrieving person images from a large dataset given a query image of the person of interest. The key challenge is to learn person representations robust to intra-class variations, as different persons can share the same attribute and the same person's appearance can look different under viewpoint changes. Recent reID methods focus on learning features that are discriminative but robust to only a particular factor of variation (e.g., human pose), which requires corresponding supervisory signals (e.g., pose annotations). To tackle this problem, we propose to disentangle identity-related and identity-unrelated features from person images. Identity-related features contain information useful for specifying a particular person (e.g., clothing), while identity-unrelated ones capture other factors (e.g., human pose, scale changes). To this end, we introduce a new generative adversarial network, dubbed identity shuffle GAN (IS-GAN), that factorizes these features using identification labels without any auxiliary information. We also propose an identity shuffling technique to regularize the disentangled features. Experimental results demonstrate the effectiveness of IS-GAN, which largely outperforms the state of the art on standard reID benchmarks including Market-1501, CUHK03, and DukeMTMC-reID. Our code and models will be available online at the time of publication.
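The identity shuffling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a hypothetical feature layout in which the first `d_related` dimensions of each feature vector are identity-related and the rest are identity-unrelated, and it shows only the swap itself (in IS-GAN the generator would then reconstruct person images from the shuffled features):

```python
import numpy as np

def split_features(f, d_related):
    """Split a feature vector into identity-related and identity-unrelated
    parts (hypothetical layout: first d_related dims are identity-related)."""
    return f[:d_related], f[d_related:]

def identity_shuffle(f_a, f_b, d_related):
    """Swap the identity-related parts of two persons' feature vectors while
    keeping each vector's own identity-unrelated part (pose, scale, ...)."""
    id_a, rest_a = split_features(f_a, d_related)
    id_b, rest_b = split_features(f_b, d_related)
    shuffled_a = np.concatenate([id_b, rest_a])
    shuffled_b = np.concatenate([id_a, rest_b])
    return shuffled_a, shuffled_b

# Person A and B: 3 identity dims followed by 2 pose dims (toy values).
f_a = np.array([1., 1., 1., 5., 5.])
f_b = np.array([2., 2., 2., 9., 9.])
s_a, s_b = identity_shuffle(f_a, f_b, d_related=3)
print(s_a)  # [2. 2. 2. 5. 5.]
print(s_b)  # [1. 1. 1. 9. 9.]
```

If the reconstruction from shuffled features still yields a plausible image of person B in person A's pose, the identity-related part has indeed captured identity and nothing else, which is the regularization effect the abstract refers to.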
Learning Disentangled Representations of Videos with Missing Data
Missing data poses significant challenges for learning representations of video sequences. We present the Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable, disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object, and imputes each object trajectory where data is missing. On a Moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrates the practical value of our method in a more realistic setting.
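The imputation step can be made concrete with a toy sketch. This is not DIVE's learned, generative imputation; it is a hypothetical stand-in that uses a per-frame missingness mask (playing the role of the missingness variable) and linear interpolation to fill in an object's 1-D trajectory where frames are unobserved:

```python
import numpy as np

def impute_trajectory(positions, missing):
    """Fill in missing frames of a 1-D object trajectory by linear
    interpolation between observed frames (toy stand-in for DIVE's
    learned imputation; `missing` is a boolean per-frame mask)."""
    t = np.arange(len(positions))
    observed = ~missing
    return np.interp(t, t[observed], positions[observed])

pos = np.array([0., 1., 0., 0., 4.])                    # values at frames 2-3 are garbage
miss = np.array([False, False, True, True, False])      # frames 2-3 unobserved
print(impute_trajectory(pos, miss))  # [0. 1. 2. 3. 4.]
```

In DIVE the analogous mask is inferred rather than given, and the filled-in values come from the disentangled appearance/pose dynamics model rather than interpolation.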
Learning Disentangled Representations and Group Structure of Dynamical Environments
Robin Quessard, Thomas D. Barrett, William R. Clements
Learning disentangled representations is a key step towards effectively discovering and modelling the underlying structure of environments. In the natural sciences, physics has found great success by describing the universe in terms of symmetry preserving transformations. Inspired by this formalism, we propose a framework, built upon the theory of group representation, for learning representations of a dynamical environment structured around the transformations that generate its evolution. Experimentally, we learn the structure of explicitly symmetric environments without supervision from observational data generated by sequential interactions. We further introduce an intuitive disentanglement regularisation to ensure the interpretability of the learnt representations. We show that our method enables accurate long-horizon predictions, and demonstrate a correlation between the quality of predictions and disentanglement in the latent space.
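The group-representation idea can be sketched with a toy example. The following is an illustration, not the paper's method: it assumes each generating transformation is represented as a rotation acting on the latent space, and takes "disentangled" to mean block-diagonal, i.e., each action parameter rotates only its own 2-D latent subspace:

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix: a representation of a one-parameter group."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def block_rep(thetas):
    """Block-diagonal group representation: the i-th action parameter
    rotates only the i-th 2-D latent subspace, one simple notion of a
    disentangled representation of the group action."""
    n = len(thetas)
    R = np.zeros((2 * n, 2 * n))
    for i, th in enumerate(thetas):
        R[2 * i:2 * i + 2, 2 * i:2 * i + 2] = rotation(th)
    return R

z = np.array([1., 0., 1., 0.])        # latent state with two 2-D factors
R = block_rep([np.pi / 2, 0.0])       # act on the first factor only
print(np.round(R @ z, 6))  # [0. 1. 1. 0.]
```

The second factor is untouched, which is the interpretability property the disentanglement regularisation in the paper is designed to encourage; long-horizon prediction then amounts to composing such representation matrices.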
Review for NeurIPS paper: Learning Disentangled Representations and Group Structure of Dynamical Environments
While the rebuttal did address some of my concerns, I cannot raise my score further. In particular, I would still like to see an experimental analysis added on a standard benchmark where the proposed method **fails** (perhaps this is the case for the promised experiments on 3D cars or 3D shapes, but this is not clear from the text). This would make it easier for others to follow up on this work. I also recognize the scalability issues of the proposed method pointed out by R2 and R5, which I had not initially considered. I agree that this issue should be discussed in the paper, and ideally the computational complexity should be empirically analyzed. However, considering that the field of disentanglement is still rather nascent and mostly concerned with synthetic datasets and overengineered methods, I don't think this is reason for rejection or a lower score.
Reviews: Learning Disentangled Representation for Robust Person Re-identification
This paper describes an approach to person re-identification that uses a generative model to effectively disentangle feature representations into identity-related and identity-unrelated aspects. The proposed technique uses an Identity-shuffling GAN (IS-GAN) that learns to reconstruct person images from paired latent representations even when the identity-specific representation is shuffled and paired with identity-unrelated representation from a different person. Experimental results are given on the main datasets in use today: CUHK03, Market-1501, and DukeMTMC. The paper is very well-written and the technical development is concise but clear. There are a *ton* of moving parts in the proposed approach, but I feel like the results would be reproducible with minimal head scratching from the written description.
Reviews: Learning Disentangled Representations for Recommendation
However, the whole framework makes sense to me, and the use of the Gumbel-softmax trick and cosine similarity is also reasonable. It would help to include the baselines (e.g., MultDAE) in Figure 2, so that we can see the comparison, since learning such an item representation (distinguished by category, like clustering) is not hard. The micro disentanglement (Figure 3) is interesting, but a quantitative measurement is missing. Perhaps the proposed macro-micro structure alleviates the data sparsity problem in some way?